Multi-Player Bandits: A Trekking Approach
Authors
Abstract
We study stochastic multi-armed bandits with many players. The players do not know the number of players, cannot communicate with each other, and if multiple players select a common arm they collide and none of them receive any reward. We consider both a static scenario, where the number of players remains fixed, and a dynamic scenario, where players may enter and leave at any time. We provide algorithms based on a novel 'trekking approach' that guarantees constant regret for the static case and sub-linear regret for the dynamic case with high probability. The trekking approach eliminates the need to estimate the number of players, resulting in fewer collisions and improved performance compared to state-of-the-art algorithms. We also develop an epoch-less algorithm that removes the requirement of time synchronization across players, provided each player can detect the presence of other players on an arm. We validate our theoretical guarantees using simulations and real test-bed experiments.
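The collision model described in the abstract can be made concrete with a minimal sketch: in each round every player picks an arm; players sharing an arm collide and get zero, while a player alone on an arm draws a Bernoulli reward from that arm's mean. This is an illustrative simulation of the reward model only, not the paper's trekking algorithm; the function name `pull` and the arm means are assumptions for the example.

```python
import random

def pull(arm_means, choices, rng):
    """One round of the multi-player collision model: players that
    choose the same arm collide and all receive zero reward; a player
    alone on an arm gets a Bernoulli(arm mean) reward."""
    rewards = []
    for arm in choices:
        if choices.count(arm) > 1:
            # Collision: two or more players on this arm, no reward.
            rewards.append(0.0)
        else:
            # Sole occupant: stochastic reward from the arm's distribution.
            rewards.append(1.0 if rng.random() < arm_means[arm] else 0.0)
    return rewards

rng = random.Random(0)
means = [0.9, 0.6, 0.3]
# Players 0 and 1 collide on arm 0; player 2 is alone on arm 2.
print(pull(means, [0, 0, 2], rng))
```

Under this model, the regret guarantees in the abstract must account both for pulling suboptimal arms and for rounds lost to collisions, which is why avoiding player-count estimation (and its collision-heavy estimation phases) helps.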
Similar Resources
Multi-Player Bandits - a Musical Chairs Approach
We consider a variant of the stochastic multi-armed bandit problem, where multiple players simultaneously choose from the same set of arms and may collide, receiving no reward. This setting has been motivated by problems arising in cognitive radio networks, and is especially challenging under the realistic assumption that communication between players is limited. We provide a communication-free...
Multi-Player Bandits Models Revisited
Multi-player Multi-Armed Bandits (MAB) have been extensively studied in the literature, motivated by applications to Cognitive Radio systems. Driven by such applications as well, we motivate the introduction of several levels of feedback for multi-player MAB algorithms. Most existing work assumes that sensing information is available to the algorithm. Under this assumption, we improve the state-...
Qualitative Multi-Armed Bandits: A Quantile-Based Approach
We formalize and study the multi-armed bandit (MAB) problem in a generalized stochastic setting, in which rewards are not assumed to be numerical. Instead, rewards are measured on a qualitative scale that allows for comparison but invalidates arithmetic operations such as averaging. Correspondingly, instead of characterizing an arm in terms of the mean of the underlying distribution, we opt for...
Transfer Learning in Multi-Armed Bandits: A Causal Approach
Reinforcement learning (RL) agents have been deployed in complex environments where interactions are costly, and learning is usually slow. One prominent task in these settings is to reuse interactions performed by other agents to accelerate the learning process. Causal inference provides a family of methods to infer the effects of actions from a combination of data and qualitative assumptions a...
Playing in stochastic environment: from multi-armed bandits to two-player games
Given a zero-sum infinite game, we examine the question of whether players have optimal memoryless deterministic strategies. It turns out that under some general conditions the problem for two-player games can be reduced to the same problem for one-player games, which in turn can be reduced to a simpler related problem for multi-armed bandits. Digital Object Identifier 10.4230/LIPIcs.FSTTCS.2010.65
Journal
Journal title: IEEE Transactions on Automatic Control
Year: 2022
ISSN: 0018-9286, 1558-2523, 2334-3303
DOI: https://doi.org/10.1109/tac.2021.3077454